Consider a graph with a set of vertices and oriented edges connecting
pairs of vertices. Each vertex is associated with a random variable and
these are assumed to be independent. In this setting, suppose we wish to
solve the following hypothesis testing problem: under the null, the
random variables have common distribution $N(0,1)$, while under the
alternative, there is an unknown path along which random variables have
distribution $N(\mu,1)$, $\mu > 0$, and distribution $N(0,1)$ away from
it. For which values of the mean shift $\mu$ can one reliably detect and
for which values is this impossible?

Consider, for example, the usual regular lattice with vertices of 
the form 

$$\{(i,j) : 0 \le i,\ -i \le j \le i \text{ and } j \text{ has the parity of } i\}$$

and oriented edges $(i,j) \to (i+1, j+s)$, where $s = \pm 1$. We show that
for paths of length $m$ starting at the origin, the hypotheses become
distinguishable (in a minimax sense) if $\mu_m \gg 1/\sqrt{\log m}$, while
they are not if $\mu_m \ll 1/\log m$. We derive equivalent results in a
Bayesian setting where one assumes that all paths are equally likely;
there, the asymptotic threshold is $\mu_m \approx m^{-1/4}$.

We obtain corresponding results for trees (where the threshold is 
of order 1 and independent of the size of the tree), for distributions 
other than the Gaussian and for other graphs. The concept of the 
predictability profile, first introduced by Benjamini, Pemantle and 
Peres, plays a crucial role in our analysis. 
1. Introduction. This paper discusses the model problem of detecting
whether or not there is a chain of connected nodes in a given network which
exhibit an "unusual behavior." Suppose we are given a graph $G$ with vertex
set $V$ and a random variable $X_v$ attached to each node $v \in V$. In that sense,
this is a graph-indexed process. We observe a realization of this process
and wish to know whether all the variables at the nodes have the same
behavior in the sense that they are all sampled from a common distribution
$F_0$, or whether there is a path in the network, that is, a chain of consecutive
nodes connected by edges, along which the variables at the nodes have a
different distribution $F_1$. In other words, can one tell whether, hidden in the
background noise, there is a chain of nodes that stand out?

Suppose, for example, that $F_0$ is the standard normal distribution, whereas
$F_1$ is a normal distribution with mean 0.1 and variance 1. In a situation
where the number of nodes along the path we wish to detect is comparatively
small, the largest values of $X_v$ are typically off this path. Can we reliably
detect the existence of such a path? More generally, how subtle an effect 
can we detect? In this paper, we attempt to provide quantitative answers to 
such questions by investigating asymptotic detection thresholds — values of 
the mean shift at which detection is possible and values at which detection 
by any method whatsoever is impossible. 

Detection thresholds depend, of course, on the type of graphs under con- 
sideration and we propose the study of two representative graphs which are, 
in some sense, far from each other, as well as emblematic — regular lattices 
and trees. We introduce them next. Later in the paper, we will also consider 
other graphs. 

• Regular lattice in dimension 2. Our first graph is a regular lattice with
nodes

$$V_m = \{(i,j) : 0 \le i \le m-1,\ -i \le j \le i \text{ and } j \text{ has the parity of } i\}$$

and with oriented edges $(i,j) \to (i+1, j+s)$, where $s = \pm 1$. We call $(0,0)$
the origin of the graph. Here and below, we use the subscript $m$ in $V_m$
to remind the reader of the radius of the graph. A path in the graph is
represented in Figure 1.

• Complete binary tree. Our second model is the oriented regular binary
tree. The nodes in the tree are of the form

$$V_m = \{(i,j) : 0 \le i \le m-1,\ 0 \le j < 2^i\}$$

and it has oriented edges $(i,j) \to (i+1, 2j+s)$, where $s \in \{0,1\}$. Again, we
call $(0,0)$ the origin of the graph and the subscript $m$ indicates the radius
of the graph (i.e., the depth of the tree). A path in the tree is represented in
Figure 2.
Note that even though the numbers of paths of length $m$ in both graphs
are the same, the numbers of nodes are considerably different: about $m^2/2$
for the lattice and $2^m$ for the binary tree.

Fig. 1. Representation of a path (in red) in the regular lattice. 



Fig. 2. Representation of a path (in red) in the binary tree. 
We denote by $\mathcal{P}_m$ the set of paths in the graph starting at the origin and
of length $m$. (In this paper, we define the length of a path to be the number
of vertices the path visits.) We attach a random variable $X_v$ to each node $v$
in the graph. We observe $\{X_v : v \in V\}$ and consider the following hypothesis
testing problem:

• Under $H_0$, all the $X_v$'s are i.i.d. $N(0,1)$.

• Under $H_{1,m}$, all the $X_v$'s are independent; there is an unknown path
$p \in \mathcal{P}_m$ along which the $X_v$'s are i.i.d. $N(\mu_m, 1)$, $\mu_m > 0$, while they are
i.i.d. $N(0,1)$ away from the path.

In plain English, we would like to know whether there is a path along which 
the mean is elevated. 
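
The two hypotheses on the regular lattice can be simulated directly. The following is a minimal sketch in Python using only the standard library; the function names `lattice_nodes`, `random_path` and `sample_field` are illustrative choices made here and do not come from the paper, and the anomalous path is drawn uniformly at random purely for convenience (the minimax setting makes no such assumption).

```python
import random

def lattice_nodes(m):
    """Nodes (i, j) with 0 <= i <= m-1, -i <= j <= i and j of the parity of i."""
    return [(i, j) for i in range(m) for j in range(-i, i + 1, 2)]

def random_path(m, rng):
    """An oriented path visiting m vertices from the origin; each edge adds s = +1 or -1 to j."""
    path, j = [(0, 0)], 0
    for i in range(1, m):
        j += rng.choice((-1, 1))
        path.append((i, j))
    return path

def sample_field(m, mu, rng, path=None):
    """Draw {X_v}: N(mu, 1) on the path (if one is given), N(0, 1) elsewhere."""
    on_path = set(path or [])
    return {v: rng.gauss(mu if v in on_path else 0.0, 1.0) for v in lattice_nodes(m)}
```

Calling `sample_field(m, mu, rng)` without a path gives a draw under the null, while passing `path=random_path(m, rng)` gives a draw under one alternative.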

1.1. Motivation. While this paper is mainly concerned with the study of 
fundamental detection limits, our problem is in fact motivated by applica- 
tions in various fields, especially in the area of signal detection. 

Suppose we are given very noisy data of the form

$$y_i = S_i + z_i, \tag{1.1}$$

where $(S_i)$ are sampled values of a signal of interest and $(z_i)$ is a noise term.
Based on the observations $(y_i)$, one would like to decide whether or not a
signal is hiding in the noise. That is, we would like to test whether $S = 0$ or
not. Suppose, further, that the signal is completely unknown and does not
depend on a small number of parameters. In image processing, the signal $S$
might be the indicator function of a general shape we wish to detect or a
curve embedded in a two-dimensional pixel array [3]. In signal processing,
the signal may be a chirp, a high-frequency wave with unknown and rapidly
changing oscillatory patterns [10].

In these situations, we cannot hope to generate a family of candidate
signals that would provide large correlations with the unknown signal as the
number of such candidates would be exponentially large in the signal size.
In response to this obstacle, recent papers [10, 13] have proposed a very
different approach, in which the family of candidate signals actually
corresponds to a path in a network. We briefly explain the main idea. In most
situations, it is certainly possible to generate a family of templates $(\phi_v)_{v \in V}$
which provide good local correlations with the signal of interest, for
example, over shorter time intervals. Any signal of interest could then be closely
approximated by a chain of such templates. Here, a chain is a path in a
graph $G$ with nodes $v \in V$ indexed by our templates and rules for
connecting templates, these rules possessing the following property: any consecutive
sequence of templates in the graph must correspond to a meaningful signal,
that is, a signal one might expect to observe (e.g., imagine connecting linear
segments to approximate smooth curves). Now, calculate a Z-score for each
template and denote it $X_v$. For simplicity, assume that $X_v \sim N(\mu_1, 1)$ if the
template matches the signal $S$ locally and $X_v \sim N(\mu_0, 1)$ otherwise. Assume
$\mu_1 > \mu_0$. Then, the signal detection problem is this: is there a path along
which the mean of the Z-scores is slightly elevated?

To make things a little more concrete, suppose the unknown signal S(t) 
is a chirp of the general form $A(t)\exp(i\lambda\varphi(t))$, where $A(t)$ is a smooth
amplitude, $\varphi(t)$ is a smooth phase function and $\lambda$ is a large base frequency.
Roughly speaking, a chirp is an oscillatory signal with "instantaneous
frequency" given by the derivative of the phase, that is, $\lambda\varphi'(t)$. Here, one might
use as templates chirplets of the form $\phi_v(t) \propto 1_{I_v}(t)\exp(i(a_v t^2/2 + b_v t))$,
which are supported on the time interval $I_v$ and assume the linear
instantaneous frequency $a_v t + b_v$. Such templates provide a local quadratic
approximation of the unknown phase function $\lambda\varphi(t)$ (or a local linear
approximation of the unknown instantaneous frequency) and can exhibit high
correlations with the unknown signal, provided that the discretization of the 
chirplet parameters is sufficiently fine. The chirplet graph [10] then connects 
pairs of chirplets supported on contiguous time intervals by imposing a cer- 
tain kind of continuity of the instantaneous frequency in such a way that a 
path represents a chirping signal with a piecewise linear instantaneous
frequency which obeys a prescribed regularity criterion. Given the data vector
$y$ in (1.1), one would then compute all the chirplet coefficients $X_v = \langle y, \phi_v \rangle$ of
$y$. Testing whether there is signal or not amounts to testing whether all the
node variables $X_v$ in the chirplet graph have mean 0 or whether there is a
path along which the mean is nonzero (the constraint that all possible paths
start at a given vertex corresponds to the constraint that if a signal exists,
its instantaneous frequency at time 0 is known).

Although the signal detection problem motivates the theoretical study 
presented in this paper, the problem of detecting a path in a network seems 
to represent a fundamental abstraction as many modern statistical detection 
problems can reasonably be formulated in this way. Indeed, it is very easy to 
imagine that one has available a number of measurements about variables 
related through a graphical model and that one wishes to detect whether 
there is a sequence of connected nodes which exhibit a peculiar behavior. 
We give one example to stimulate the reader's imagination. In [22], water 
quality in a network of streams is assessed by performing a chemical analysis 
at various locations along the streams. As a result, some locations are marked 
as problematic. We may view the set of all tested locations as nodes and 
connect pairs of adjacent nodes located on the same stream, thereby creating 
a tree (although not a regular tree), with the root corresponding to the point
which is the most downstream. We then assign to each node the value 1 
or 0, according to whether the location is problematic or not. A possible 
model would assume that the variables are Bernoulli, taking the value 1
with probability equal to $p_0$ when the location is normal and $p_1$ when it is
anomalous. One can then imagine that one would like to detect a path (or
a family of paths) upstream of a certain sensitive location, in order to trace 
the existence of a polluter, or look for the existence of an anomalous path 
upstream from the root of the system; see [22]. Note that here, one could 
also be interested in detecting whether or not there is a family of anomalous 
paths, as opposed to just one such path. Examples of this kind truly abound; 
for example, one could imagine detecting atypical gene behaviors in a given 
gene network, and so on. 

1.2. A quick look at the results. The optimal detection threshold
discussed above is the minimum value of $\mu = \mu_m$ which allows us to reliably
tell whether or not there is a path which does not follow the null
distribution. This value depends on the criterion used for judging the quality of the
decision rule, and statistical decision theory essentially offers two paradigms: 
the Bayesian and the minimax approach. We study them both. 

Consider the minimax paradigm first. Recall that a test $T_m$ is a $\{0,1\}$-valued,
measurable function of the collection $(X_v)_{v \in V}$. The minimax risk of
a test $T_m$ is defined as

$$\gamma(T_m) = P_0(\text{Type I}) + \sup_{p \in \mathcal{P}_m} P_{1,p}(\text{Type II}). \tag{1.2}$$

Throughout, we write $P_0$ for the law of $(X_v)$ under $H_0$ and $P_{1,p}$ for the
law of the same variables under $H_{1,m}$ with path $p \in \mathcal{P}_m$. With this notation,
Type I and II are shorthand for errors of Type I and II. In longhand,

$$P_0(\text{Type I}) = P_0(T_m = 1), \qquad P_{1,p}(\text{Type II}) = P_{1,p}(T_m = 0).$$

We say that a sequence of tests $(T_m)$ is asymptotically powerful if

$$\lim_{m \to \infty} \gamma(T_m) = 0$$

and asymptotically powerless if

$$\liminf_{m \to \infty} \gamma(T_m) \ge 1.$$

When there exists an asymptotically powerful sequence of tests, we say that 
reliable detection is possible; when all sequences of tests are asymptotically 
powerless, we say that detection is (essentially) impossible. 

1.2.1. The regular lattice. We first consider the regular lattice in dimen- 
sion 2. 

Theorem 1.1. Consider the regular lattice in dimension 2. Suppose that
$\mu_m (\log m)^{1/2} \to \infty$ as $m \to \infty$. Then there is a sequence of tests which is
asymptotically powerful. On the other hand, suppose that $\mu_m \log m\,(\log\log m)^{1/2} \to 0$
as $m \to \infty$. Every sequence of tests is then asymptotically powerless.
Theorem 1.1 states that one can detect a path as long as $\mu_m \gg (\log m)^{-1/2}$,
while this is impossible if $\mu_m \le (\log m)^{-(1+\varepsilon)}$ for each $\varepsilon > 0$, provided that $m$
is sufficiently large. The reader will note the discrepancy between the lower
and the upper bound, which we will comment on in the concluding section.

It turns out that the detection level is radically different in a Bayesian 
framework where one assumes that all paths are equally likely. For a prior
$\pi$ on $\mathcal{P}_m$, namely on paths of length $m$, the corresponding risk of a test $T_m$
is now defined as

$$\gamma_\pi(T_m) = P_0(\text{Type I}) + E_\pi P_{1,p}(\text{Type II}), \tag{1.3}$$

where $E_\pi$ stands for the expectation over the prior path distribution, namely,
when the path $p$ is drawn according to $\pi$. We adopt the same terminology
as before and say that $(T_m)$ is asymptotically powerful if $\gamma_\pi(T_m) \to 0$ and
powerless if $\liminf \gamma_\pi(T_m) \ge 1$. The Bayes test associated with $\pi$ is, of course,
optimal here. The following theorem shows that under the uniform prior on
paths, the optimal Bayesian detectability threshold is about $m^{-1/4}$.

Theorem 1.2. Consider the regular lattice in dimension 2 and assume
the uniform prior on paths. If $\mu_m m^{1/4} \to \infty$ as $m \to \infty$, then the Bayes test
is asymptotically powerful. Conversely, if $\mu_m m^{1/4} \to 0$ as $m \to \infty$, then the
Bayes test is asymptotically powerless.

Roughly speaking, if the anomalous path is chosen uniformly at random,
one can asymptotically detect it as long as the intensity along the path
exceeds $m^{-1/4}$, while no method whatsoever can detect below this level.

Both results indicate that it is possible to detect an anomalous path even
when $\mu_m \to 0$ (sufficiently slowly). Note that while one can certainly reliably
detect in such circumstances, it may be impossible to tell which sequence of 
nodes the anomalous path is traversing. This is an example of a situation 
where detection is possible, but estimation may not be. 

1.2.2. The binary tree. We are now interested in the complete binary 
tree. 

Theorem 1.3. If $\mu_m = \mu \ge \sqrt{2\log 2}$, then there is a sequence of tests
that is asymptotically powerful. On the other hand, if $\mu_m = \mu < \sqrt{2\log 2}$,
then there is no sequence of tests that is asymptotically powerful. Moreover, if
$\mu_m \to 0$ as $m \to \infty$, then every sequence of tests is asymptotically powerless.

Notice that there is no sharp threshold phenomenon here, in the sense that
the minimax risk does not converge to 1 if $\mu_m = \mu < \sqrt{2\log 2}$. For example,
the risk of the test which rejects the null hypothesis for large values of the
variable at the root node is bounded away from 1 for any $\mu > 0$.
For any graph, and under the normal model, consider the generalized
likelihood ratio test (GLRT) which is the test rejecting the null for large
values of $M_m := \max\{X_p : p \in \mathcal{P}_m\}$, where $X_p$ is the sum of the node variables
along the path $p$:

$$X_p = \sum_{v \in p} X_v. \tag{1.4}$$

The proof of Theorem 1.3 then shows that for the binary tree, the GLRT
achieves the minimax threshold in that it has asymptotically full power when
$\mu > \sqrt{2\log 2}$. In this sense, the GLRT rivals the Bayes test under the uniform
prior on paths, which, by symmetry, is minimax.
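
On the regular lattice, the maximum defining $M_m$ does not require enumerating all $2^{m-1}$ paths. The sketch below uses backward dynamic programming, which is one natural way to do this and is assumed here rather than taken from the paper; the function name `max_path_sum` and the dictionary layout `x[(i, j)]` are illustrative.

```python
def max_path_sum(x, m):
    """Largest sum of x[(i, j)] over oriented lattice paths of m vertices from the origin.

    best[(i, j)] holds the largest sum over paths that start at (i, j) and
    run down to level m-1; it is filled from the last level upward.
    """
    last = m - 1
    best = {(last, j): x[(last, j)] for j in range(-last, last + 1, 2)}
    for i in range(m - 2, -1, -1):
        for j in range(-i, i + 1, 2):
            # The two oriented edges out of (i, j) lead to (i+1, j-1) and (i+1, j+1).
            best[(i, j)] = x[(i, j)] + max(best[(i + 1, j - 1)], best[(i + 1, j + 1)])
    return best[(0, 0)]
```

The cost is proportional to the number of nodes, about $m^2/2$, instead of the number of paths.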

1.3. Innovations and related work. In the regular graph model, the
number of variables needed to describe the path is $m$, while the total number
of nodes or observations is about $m^2/2$. Hence, the topic of this paper fits
into the broad framework of nonparametric detection as the object we wish
to detect is simply too complex to be reduced to a small number of
parameters. Because the theory and practice of detection have been centered
around parametric models in which the generalized likelihood ratio test has 
played a crucial role (see the literature on scan statistics, matched filters 
and deformable templates, to name a few equivalent terms used in vari- 
ous fields of science and engineering [2, 16, 20, 24]), methods and results 
for nonparametric detection are comparably underdeveloped. Against this 
background, we will first provide some evidence showing that the generalized 
likelihood ratio test does not perform very well in our nonparametric set-up. 
Our work also differs from the important literature on nonparametric detec- 
tion in that it does not assume that the unknown object we wish to detect 
lies in a traditional smoothness class, such as Sobolev or Besov classes, or 
belongs to an $\ell_p$-ball or some related geometric body; see the book by
Ingster and Suslina [19] and the multiple references therein. In fact, our model,
techniques and results have nothing to do with this literature and hence 
our paper contributes to developing the important area of nonparametric 
detection in what appears to be a new direction. In fact, we are not famil- 
iar with statistical theory posing a problem as a graph detection problem 
and giving precise quantitative bounds. It has come to our attention, how- 
ever, that Berger and Peres have very recently considered problems which 
are mathematically closely related to our framework but with a different 
motivation. 

Our paper also has some connections with the theory and practice of 
multiple hypothesis testing. Indeed, we are interested in situations where 
testing at each node separately offers little or no power so that we need to 
combine information from different nodes. Because the anomalous nodes are 
located on a path, the search naturally involves testing over paths. There
are many such paths, however, and in this sense, our problem resembles that
of testing many hypotheses (one hypothesis test would be whether the mean 
along a specified path is zero or not). 

1.4. Organization of the paper. The paper is organized as follows. In Sec- 
tion 2, we study the detection problems over the regular lattice in dimension 
2 and prove our results about the minimax and Bayesian detection thresh- 
olds, namely, Theorems 1.1 and 1.2. In Section 3, we prove the detection 
thresholds for the binary tree. In Section 4, we extend our results to expo- 
nential distributions at the nodes and in Section 5, to other distributions and 
other graphs. In Section 6, we report on numerical simulations which com- 
plement our theoretical study. Finally, we conclude with Section 7, where 
we comment on our findings and discuss open problems. 

2. The regular lattice. Throughout, for positive sequences $(a_m)$, $(b_m)$,
we write $a_m \asymp b_m$ if the ratio $a_m/b_m$ is bounded away from zero and infinity.
Also, we occasionally drop subscripts to lighten the notation, wherever there 
is no ambiguity. 

2.1. Bayesian detection. We assume the uniform distribution over all
paths, denoted by $\pi$. Equivalently, the distribution of the unknown path is
that of an oriented symmetric random walk. We write $P_\pi(\cdot) = E_\pi P_{1,p}(\cdot)$.
As is well known, the test minimizing the risk (1.3) is the Neyman-Pearson
test which rejects the null if and only if the likelihood ratio $L_m(X) =
dP_\pi(X)/dP_0(X)$ exceeds 1 (the subscript $m$ refers here to the size of the
problem). Here, the likelihood ratio is given by

$$L_m(X) = 2^{-(m-1)} \sum_{p \in \mathcal{P}_m} e^{\mu_m X_p - m\mu_m^2/2}, \tag{2.1}$$

where $X_p$ is defined in (1.4). Although $L_m(X)$ is an average over an
exponentially large number of paths so that, at first sight, calculating this quantity
may seem practically impossible, there is a recurrence relation which
actually gives an algorithm for computing the likelihood ratio in a number
of operations which is proportional to the number of nodes; see Section 6
for details. Note that the likelihood ratio $L_m(X)$ is closely related to the
partition function of models of random polymers; see [11].
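
The recurrence itself is deferred to Section 6 and is not reproduced in this part of the paper. As a hedged illustration, one recurrence that does compute (2.1) in time proportional to the number of nodes is the following: attach the factor $e^{\mu X_v - \mu^2/2}$ to each node and average the two children at each step, working from the last level back to the origin. The function name `likelihood_ratio` and the dictionary layout `x[(i, j)]` are assumptions made for this sketch.

```python
import math

def likelihood_ratio(x, m, mu):
    """Average over all 2^(m-1) lattice paths p of exp(mu * X_p - m * mu^2 / 2).

    w[(i, j)] is the same average restricted to the sub-paths that start at
    (i, j) and end at level m-1, so w[(0, 0)] is the likelihood ratio L_m(X).
    """
    last = m - 1
    w = {(last, j): math.exp(mu * x[(last, j)] - mu * mu / 2.0)
         for j in range(-last, last + 1, 2)}
    for i in range(m - 2, -1, -1):
        for j in range(-i, i + 1, 2):
            node_factor = math.exp(mu * x[(i, j)] - mu * mu / 2.0)
            # A uniform path moves to each of its two children with probability 1/2.
            w[(i, j)] = node_factor * 0.5 * (w[(i + 1, j - 1)] + w[(i + 1, j + 1)])
    return w[(0, 0)]
```

For large $m$ the products can overflow or underflow in floating point, so a practical version would likely carry logarithms instead; that refinement is left out to keep the sketch short.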

2.1.1. Proof of Theorem 1.2: upper bound. Assume $\mu_m m^{1/4} \to \infty$. This
implies the existence of a sequence of real numbers $(h_m)$ tending to infinity
and such that $\mu_m m^{1/4} h_m^{-1/2} \to \infty$. Define $S(h_m)$ as the set of nodes obeying

$$S(h_m) = \{(i,j) \in V_m : |j| \le h_m\sqrt{m}\}.$$

In other words, $S(h_m)$ is the intersection of $V_m$ with a strip of width $h_m\sqrt{m}$.
Define $T_m$, the sum of the variables in the strip,

$$T_m = \sum_{(i,j) \in S(h_m)} X_{i,j}, \tag{2.2}$$

and consider the test rejecting for appropriate large values of $T_m$ (determined
below). With the assumption that $h_m$ is going to infinity, the oriented
symmetric random walk $(i, S_i)_{0 \le i \le m-1}$ is contained in $S(h_m)$ with probability
approaching 1. That is, if we define the event

$$A_m = \Big\{ \max_{0 \le i \le m-1} |S_i| \le h_m\sqrt{m} \Big\},$$

then

$$\lim_{m \to \infty} P(A_m^c) = 0. \tag{2.3}$$

To see this, use Doob's inequality for martingales to get

$$P\Big( \max_{0 \le i \le m-1} |S_i| > h_m\sqrt{m} \Big) \le 2\,\frac{E|S_{m-1}|}{h_m\sqrt{m}} \le 2\,\frac{\sqrt{E S_{m-1}^2}}{h_m\sqrt{m}} \le \frac{2}{h_m}.$$

Let $n_m$ be the number of nodes in $S(h_m)$ and note that $n_m = h_m m^{3/2}(1 + o(1))$.
Under $H_0$,

$$T_m \sim N(0, n_m),$$

while under $H_1$, conditionally on the event $A_m$, we have

$$T_m \sim N(m\mu_m, n_m).$$

It then follows from (2.3) and the fact that

$$m\mu_m n_m^{-1/2} \asymp \mu_m m^{1/4} h_m^{-1/2} \to \infty \quad \text{as } m \to \infty$$

that the test with rejection region $\{T_m > m\mu_m/2\}$ obeys

$$\lim_{m \to \infty} P_0(\text{Type I}) = 0 \quad \text{and} \quad \lim_{m \to \infty} P_\pi(\text{Type II}) = 0.$$

That is, the test based on $T_m$ is asymptotically powerful. This completes
the proof of the first part of Theorem 1.2.
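
The strip test is simple enough to state in a few lines of code. The sketch below assumes the observations are stored in a dictionary `x[(i, j)]`; the names `strip_statistic` and `strip_test` are illustrative, and the choice of the width parameter `h` (the sequence $h_m$ of the proof) is left to the caller.

```python
import math

def strip_statistic(x, m, h):
    """Sum of the node variables X_{i,j} over the strip |j| <= h * sqrt(m)."""
    bound = h * math.sqrt(m)
    return sum(value for (i, j), value in x.items() if abs(j) <= bound)

def strip_test(x, m, mu, h):
    """Return True (reject the null) when the strip sum exceeds m * mu / 2."""
    return strip_statistic(x, m, h) > m * mu / 2.0
```

The threshold $m\mu_m/2$ sits halfway between the null mean 0 and the mean $m\mu_m$ obtained when the whole path lies inside the strip.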

2.1.2. Proof of Theorem 1.2: lower bound. It suffices to bound the Bayes
risk from below. Note that

$$B_m(\pi) := \inf_{\text{all tests}} \gamma_\pi(T_m) = P_0(L_m > 1) + P_\pi(L_m \le 1), \tag{2.4}$$

where $L_m$ is the likelihood ratio defining the Bayes test, $L_m(X) = (dP_\pi/dP_0)(X)$,
and $E_0 L_m = 1$. A standard calculation shows that

$$B_m(\pi) = 1 - \tfrac{1}{2} E_0|L_m - 1| \ge 1 - \tfrac{1}{2}\sqrt{E_0(L_m - 1)^2}. \tag{2.5}$$

Therefore, to show that the two hypotheses are asymptotically
indistinguishable, it is sufficient to establish that, under the null, the variance of
the likelihood ratio tends to zero.

Another standard calculation shows that the variance of $L_m$ is given by

$$E_0(L_m - 1)^2 = E_0 L_m^2 - 1 = E e^{\mu_m^2 N_m} - 1, \tag{2.6}$$

where $N_m$ is the number of crossings of two independent paths of length $m$
drawn from the prior. Hence, to derive a lower bound with this strategy, one
needs to understand for which sequences $(t_m)$

$$\lim_{m \to \infty} M_m(t_m) = 1, \tag{2.7}$$

where $M_m(t) := E e^{t N_m}$, $t \in \mathbb{R}$, is the moment generating function of $N_m$.
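
For readers who want the "standard calculation" behind (2.6) spelled out, here is one way it can go. This is a sketch written for this passage, not a quotation from the paper; $p$ and $p'$ denote two independent paths drawn from $\pi$, and $N_m = |p \cap p'|$ is the number of vertices they share.

```latex
% Sketch of the computation of E_0 L_m^2, using L_m = E_\pi exp(mu_m X_p - m mu_m^2 / 2).
\begin{align*}
E_0 L_m^2
  &= E_0 \Big[ E_\pi\, e^{\mu_m X_p - m\mu_m^2/2} \Big]^2
   = E_{\pi \otimes \pi}\, e^{-m\mu_m^2}\, E_0\, e^{\mu_m (X_p + X_{p'})} \\
  &= E_{\pi \otimes \pi}\, e^{-m\mu_m^2}\, e^{\mu_m^2 (2m + 2N_m)/2}
   = E\, e^{\mu_m^2 N_m}.
\end{align*}
% Under H_0, X_p + X_{p'} is N(0, 2m + 2 N_m): each shared vertex contributes 2 X_v
% (variance 4), and each of the remaining 2(m - N_m) visited vertices contributes variance 1.
```

Subtracting 1 and using $E_0 L_m = 1$ gives the variance in (2.6).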

Here, the prior is the distribution of a symmetric random walk and the
reader may know that $E N_m \asymp m^{1/2}$. Since

$$E e^{t_m N_m} \ge 1 + t_m E N_m,$$

this shows that it is necessary, for the bound to be effective, to have $t_m m^{1/2} \to 0$
or, equivalently, $\mu_m m^{1/4} \to 0$. This is the correct asymptotic behavior, as
we shall see next.

Let $(S_i)_{1 \le i \le m}$ and $(S'_i)_{1 \le i \le m}$ be two independent symmetric random
walks (note the slight change of the range of indices, which is of no
consequence whatsoever). Observe that $\{S_i = S'_i\} = \{S_i - S'_i = 0\}$ so that we
equivalently need to study the number $N_m$ of returns to zero of the
difference process $(S_i - S'_i)_{1 \le i \le m}$, which is a Markov chain with the even integers
as state space, and with jump probabilities to each neighbor equal to 1/4
and probability to stay put equal to 1/2. Therefore, the joint law of the
difference process is that of $(S_{2i})_{1 \le i \le m}$, where, again, $S$ is a symmetric random
walk (note the doubling of the interval together with the sampling at even
times only). An immediate consequence is that

$$P(N_m = k) = P(|\{1 \le i \le m : S_{2i} = 0\}| = k).$$

The number of returns of a random walk to the origin has been well studied
and we have from [14] and [15], page 96 that

$$P(N_m = k) = \frac{1}{2^{2m-k}} \binom{2m-k}{m}. \tag{2.8}$$

The idea is now to develop a useful upper bound on the right side of (2.8)
in order to estimate the moment generating function of $N_m$.
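
As a sanity check on (2.8), the law of $N_m$ can be computed exactly for small $m$ by propagating the difference chain described above (stay put with probability 1/2, move by $\pm 2$ with probability 1/4 each) and compared with the closed form. The sketch below does this with exact rational arithmetic; the function names are illustrative, and the chain is assumed to start at 0.

```python
from fractions import Fraction
from math import comb

def crossings_law_formula(m):
    """P(N_m = k) for k = 0..m according to the closed form (2.8)."""
    return [Fraction(comb(2 * m - k, m), 2 ** (2 * m - k)) for k in range(m + 1)]

def crossings_law_exact(m):
    """P(N_m = k) for k = 0..m by exact propagation of the difference chain."""
    steps = ((-2, Fraction(1, 4)), (0, Fraction(1, 2)), (2, Fraction(1, 4)))
    law = {(0, 0): Fraction(1)}  # (current difference, zeros counted so far) -> probability
    for _ in range(m):
        nxt = {}
        for (d, k), p in law.items():
            for step, q in steps:
                d2 = d + step
                key = (d2, k + (1 if d2 == 0 else 0))
                nxt[key] = nxt.get(key, Fraction(0)) + p * q
        law = nxt
    out = [Fraction(0)] * (m + 1)
    for (_, k), p in law.items():
        out[k] += p
    return out
```

The two functions return identical lists for the small values of $m$ exercised in the accompanying test, which is consistent with (2.8) as stated.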

First, recall the classical refinement of the Stirling approximation to $n!$
(see [15], pages 50-53), which states that

$$\sqrt{2\pi}\, n^{n+1/2} e^{-n + 1/(12n+1)} < n! < \sqrt{2\pi}\, n^{n+1/2} e^{-n + 1/(12n)}.$$
Substituting this approximation into (2.8) (when expanding the binomial
coefficients) yields

$$\begin{aligned}
P(N_m = k) &\le \frac{1}{\sqrt{\pi m}}\, \frac{(1 - k/2m)^{2m-k+1/2}}{(1 - k/m)^{m-k+1/2}} \\
&= \frac{1}{\sqrt{\pi m}} \sqrt{\frac{1 - k/2m}{1 - k/m}}\; e^{-m g(k/m)},
\end{aligned} \tag{2.9}$$

where

$$g(t) = (1 - t)\log(1 - t) - 2(1 - t/2)\log(1 - t/2), \qquad 0 \le t < 1.$$

For $t \in (0,1)$, it holds that $\frac{d^2}{dt^2}\big(g(t) - t^2/4\big) > 0$ and, by convexity, the
function $g(t) - t^2/4$ is above its tangent at the origin. This tangent is the
line $y = 0$ since $g(0) = g'(0) = 0$, whence

$$g(t) \ge t^2/4 \qquad \forall t \in [0,1).$$

Also, observe that $(1 - t/2)/(1 - t) \le 1 + t$ for each $t \in [0, 1/2]$. Now, fix
$0 < \varepsilon < 1/2$. For $k \le \varepsilon m$, we have $(1 - k/2m)/(1 - k/m) \le 1 + \varepsilon$, while for
$k < m$, one always has $(1 - k/2m)/(1 - k/m) \le m$. We then conclude that

$$P(N_m = k) \le \begin{cases}
\sqrt{\dfrac{1+\varepsilon}{\pi m}}\; e^{-k^2/4m}, & k \le \varepsilon m, \\[2ex]
\dfrac{1}{\sqrt{\pi}}\; e^{-k^2/4m}, & \varepsilon m < k \le m.
\end{cases} \tag{2.10}$$

[The case $k = m$ in the above estimate is checked directly rather than from
(2.9).]

The estimate (2.10) gives an upper bound on the moment generating
function at $t_m$ since

$$M_m(t_m) \le \sum_{k=0}^{\lfloor \varepsilon m \rfloor} e^{t_m k} \sqrt{\frac{1+\varepsilon}{\pi m}}\, e^{-k^2/4m} + \sum_{k=\lfloor \varepsilon m \rfloor + 1}^{m} e^{t_m k}\, \frac{1}{\sqrt{\pi}}\, e^{-k^2/4m}.$$

It is clear that if $t_m \to 0$ as $m \to \infty$, then the second term of the right side
goes to zero as $m \to \infty$ so we focus on the first term. Using the monotonicity
in $k$ of both $e^{t_m k}$ and $e^{-k^2/4m}$, we have

$$\begin{aligned}
\sum_{k=0}^{\lfloor \varepsilon m \rfloor} e^{t_m k} \frac{1}{\sqrt{\pi m}}\, e^{-k^2/4m}
&\le \frac{1}{\sqrt{\pi m}} + \frac{e^{t_m}}{\sqrt{\pi}} \int_0^{\varepsilon\sqrt{m}} e^{\sqrt{m}\, t_m u}\, e^{-u^2/4}\, du \\
&= \frac{1}{\sqrt{\pi m}} + 2 e^{t_m} \int_0^{\varepsilon\sqrt{m/2}} e^{\sqrt{2m}\, t_m u}\, \frac{e^{-u^2/2}}{\sqrt{2\pi}}\, du \\
&\le \frac{1}{\sqrt{\pi m}} + 2 e^{m t_m^2 + t_m}\, P(Z > -\sqrt{2m}\, t_m),
\end{aligned}$$

where $Z$ is a standard normal random variable. It follows that if $t_m$ is chosen
such that $\sqrt{m}\, t_m \to 0$ as $m \to \infty$, then

$$\lim_{m \to \infty} 2 e^{m t_m^2 + t_m}\, P(Z > -\sqrt{2m}\, t_m) = 1$$

and thus $\lim_{m \to \infty} M_m(t_m) = 1$. In conclusion, we have proven that

$$\lim_{m \to \infty} \mu_m m^{1/4} = 0 \implies \liminf_{m \to \infty} B_m(\pi) \ge 1. \tag{2.11}$$

This proves the second part of Theorem 1.2.

2.2. Minimax detection. Just as in the Bayesian case, we first prove
the upper bound by constructing a test which allows us to detect reliably
when $\mu_m$ decays slower than $(\log m)^{-1/2}$, and then study the lower minimax
bound.

2.2.1. Proof of Theorem 1.1: upper bound. Consider a simple test
statistic of the form

$$T_m = \sum_{(i,j) \in V_m} w_{i,j} X_{i,j}, \qquad w_{i,j} := w_i = \frac{\lambda_m}{i+1}. \tag{2.12}$$

Hence, $T_m$ is a weighted sum of the values at the vertices of the graph. For
convenience, we fix $\lambda_m$ so that $\sum_{0 \le i \le m-1} w_i = 1$. Note that $\lambda_m = (\log m)^{-1}(1 + o(1))$.
Under $H_1$, the mean of $T_m$ is given by $\mu_m \sum_{0 \le i \le m-1} w_i = \mu_m$ and since
the $X_{i,j}$'s have identical variance under both $H_0$ and $H_1$, we have

$$\operatorname{Var}_0(T_m) = \operatorname{Var}_{1,p}(T_m) = \sum_{(i,j) \in V_m} w_{i,j}^2 = \sum_{0 \le i \le m-1} (i+1)\, w_i^2 = \sum_{0 \le i \le m-1} \frac{\lambda_m^2}{i+1} = \lambda_m.$$

Hence,

$$T_m \sim_{H_0} N(0, \lambda_m) \quad \text{and} \quad T_m \sim_{H_{1,p}} N(\mu_m, \lambda_m),$$

under any alternative. Consider the test which rejects the null whenever
$T_m > \mu_m/2$. The risk of this test is then equal to

$$\gamma(T_m) = 2 P\big(N(0,1) > \tfrac{\mu_m}{2} \lambda_m^{-1/2}\big) \implies \lim_{m \to \infty} \gamma(T_m) = 0$$

when $\mu_m \lambda_m^{-1/2} \to \infty$ or, equivalently, when $\mu_m \sqrt{\log m} \to \infty$. This proves the
first part of Theorem 1.1.
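
Because the risk of this weighted test has a closed form, it can be evaluated numerically to see how slowly it improves with $m$. The following sketch computes $\lambda_m$ from the normalization $\sum_i w_i = 1$ and then the risk $2P(N(0,1) > \tfrac{\mu_m}{2}\lambda_m^{-1/2})$; the function name `weighted_test_risk` is an illustrative choice, and the normal tail is obtained from the standard-library `math.erfc`.

```python
import math

def weighted_test_risk(m, mu):
    """Return (lambda_m, risk) for the weighted-sum test on the lattice of radius m."""
    # lambda_m is fixed by sum_{i=0}^{m-1} lambda_m / (i + 1) = 1.
    lam = 1.0 / sum(1.0 / (i + 1) for i in range(m))
    z = 0.5 * mu / math.sqrt(lam)
    # P(N(0,1) > z) expressed through the complementary error function.
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return lam, 2.0 * tail
```

For a fixed $\mu$ the returned risk decreases as $m$ grows, but only at the rate dictated by $\lambda_m \approx 1/\log m$.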
2.2.2. Proof of Theorem 1.1: lower bound. The idea for obtaining a lower
bound is to exhibit a prior on $H_1$ which makes the Bayesian detection
problem as hard as possible. Consider a prior $\pi$ on $H_1$ (here a distribution on
the set of paths). Then, for all tests $T_m$,

$$\gamma(T_m) \ge \gamma_\pi(T_m) \ge B_m(\pi),$$

where $B_m(\pi)$ is the risk of the Bayes test,

$$B_m(\pi) = P_0(L_m > 1) + P_\pi(L_m \le 1).$$

Our strategy is to construct a prior on the family of paths with a low
predictability profile, that is, a process whose location in the future is hard to
predict from its current state and history.

The predictability profile of a stochastic process. The concept of the
predictability profile was first introduced in [7].

Definition 2.1. The predictability profile of a stochastic process $(S_n)_{n \ge 0}$
is defined by

$$\mathrm{PRE}_S(k) = \sup P(S_{n+k} = x \mid S_0, \ldots, S_n), \tag{2.13}$$

where the supremum is taken over all positions and histories.

We will consider nearest-neighbor walks which are defined as processes
with increments equal to ±1. Improving upon earlier results of Benjamini,
Pemantle and Peres [7], Häggström and Mossel [17], Theorem 1.4, proved
the following.

Theorem 2.2. Suppose $(f_k)_{k \ge 1}$ is a decreasing positive sequence such
that $\sum_{k \ge 1} f_k/k < \infty$. There then exists a nearest-neighbor process starting
at $S_0 = 0$ and obeying

$$\mathrm{PRE}_S(k) \le \frac{C}{k f_k} \tag{2.14}$$

for all $k \ge 1$ and some positive constant $C$.

C. Hoffman proved in [18] that this is sharp in the sense that if $(f_k)$ is
a decreasing positive sequence with $\sum_{k \ge 1} f_k/k = \infty$, then the predictability
profile (2.14) is impossible to achieve.

In what follows, we will need a quantitative, finite version of Theorem 2.2. 
This is achieved by using a concrete prior, introduced in [17], which gives 
the predictability profile below. 
Lemma 2.3 ([17], Proposition 3.1). Fix a sequence $(a_j)_{j \ge 0}$ obeying
$\sum_{j \ge 0} a_j < 1$. There then exists a nearest-neighbor process $(S_n)_{n \ge 0}$ obeying

$$\mathrm{PRE}_S(k) \le \frac{20}{k\, a_{\lfloor \log_2(k/2) \rfloor}} \qquad \text{for all } k = 1, 2, \ldots. \tag{2.15}$$

The construction of the process and the proof of (2.15) may be found in
the Appendix. Later, we will consider a prior on paths obeying (2.15) for
suitable values of the sequence $(a_j)_{j \ge 0}$.

Predictability profiles and numbers of intersections. Hereafter, we consider
stochastic processes with a finite horizon, that is, $(S_n)_{0 \le n \le m-1}$. In the sequel,
we will need to estimate the number of times two independent processes
drawn from a prior with prescribed predictability profile cross each other.
From the proof of [7], Lemma 3.1, we state the following.

Lemma 2.4. Let $B$ be such that

$$\sum_{1 \le k \le \lfloor m/B \rfloor} \mathrm{PRE}_S(kB) \le \theta < 1. \tag{2.16}$$

Then, for any sequence $(v_n)_{0 \le n \le m-1}$ and all $k \ge 1$, the distribution of the
total number of intersections between $(S_n)$ and $(v_n)$ obeys

$$P(|S \cap v| \ge k) \le B \cdot \theta^{k/B}, \qquad |S \cap v| := |\{n : S_n = v_n\}|. \tag{2.17}$$

We emphasize that the lemma is valid even if the sequence $(v_n)_{n \ge 0}$ does not
determine a nearest-neighbor path.

We now prove the lower bound in Theorem 1.1 by providing a lower bound
for the Bayes risk $B_m(\pi)$ for the prior $\pi$ given by Lemma 2.3, and with the
sequence

$$a_j = a_j(m) = \begin{cases}
1/(3\log_2 m), & j \le \log_2 m, \\
0, & j > \log_2 m.
\end{cases} \tag{2.18}$$

With the above choice, $\sum_{j \ge 0} a_j \le \frac{\log_2 m + 1}{3\log_2 m} < 1/2$ for $m > 4$.

As in the analysis of the Bayes risk [see (2.5), (2.6)], we employ the simple
bound

$$B_m(\pi) \ge 1 - \tfrac{1}{2}\sqrt{E_0(L_m - 1)^2}, \qquad E_0(L_m - 1)^2 = E e^{\mu_m^2 N_m} - 1, \tag{2.19}$$

where $L_m$ is the likelihood ratio and $N_m$ is the number of crossings of two
independent paths drawn from the prior $\pi$. We compute

$$\sum_{k \ge 1} e^{\mu_m^2 k} P(N_m = k) = \sum_{1 \le k \le K-1} e^{\mu_m^2 k} P(N_m = k) + \sum_{k \ge K} e^{\mu_m^2 k} \big[ P(N_m \ge k) - P(N_m \ge k+1) \big]$$

and, summing by parts, deduce that

$$E e^{\mu_m^2 N_m} \le e^{\mu_m^2 (K-1)} + \big[1 - e^{-\mu_m^2}\big] \sum_{k \ge K} P(N_m \ge k)\, e^{\mu_m^2 k}.$$

With the choice (2.18), Lemma 2.3 gives

$$
\mathrm{PRE}_S(k) \le (60 \log_2 m)/k, \quad k = 1, 2, \ldots.
$$

In particular, with $B = B_m = 120 (\log m)^2/\log 2$, we have

$$
\sum_{k=1}^{\lfloor m/B_m \rfloor} \mathrm{PRE}_S(k B_m) \le \frac{1}{2}.
$$
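The bound on this sum can be checked in one line. The following is only a sketch of the arithmetic, assuming the estimate $\mathrm{PRE}_S(k) \le (60 \log_2 m)/k$ and the elementary inequality $\sum_{k=1}^{n} 1/k \le 1 + \log n$; the last step uses $\log B_m \ge 1$, which holds for every $m \ge 2$.

```latex
\sum_{k=1}^{\lfloor m/B_m \rfloor} \mathrm{PRE}_S(k B_m)
  \le \frac{60 \log_2 m}{B_m} \sum_{k=1}^{\lfloor m/B_m \rfloor} \frac{1}{k}
  \le \frac{60 \log_2 m}{B_m} \Bigl( 1 + \log \frac{m}{B_m} \Bigr)
  \le \frac{60\, (\log m)^2}{B_m \log 2}
  = \frac{1}{2}.
```

When $m < B_m$ the sum is empty and the bound is trivial.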

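Applying Lemma 2.4 yields

$$
E_0 L_m^2 \le e^{\mu_m^2 (K-1)} + [1 - e^{-\mu_m^2}] B_m \sum_{k \ge K} e^{\mu_m^2 k} 2^{-k/B_m}
\le e^{\mu_m^2 (K-1)} + [1 - e^{-\mu_m^2}] B_m \frac{\alpha_m^K}{1 - \alpha_m}, \qquad \alpha_m = e^{\mu_m^2} 2^{-1/B_m} < 1,
$$

where the last inequality is due to the fact that $\lim_{m \to \infty} \mu_m^2 B_m = 0$ [since $\mu_m (\log m)(\log \log m)^{1/2} = o(1)$]. Further,

$$
\liminf_{m \to \infty} (-B_m \log \alpha_m) = \log 2 \implies \frac{1}{1 - \alpha_m} \le c_1 B_m
$$

for some constant $c_1$ and all $m$ large. It follows that for some constant $c_2$ and all $m$ large,

$$
E_0 L_m^2 \le e^{\mu_m^2 K} + c_2\, \mu_m^2 B_m^2\, e^{-K (\log 2)/(2 B_m)}.
$$

Taking $K = K_m = 2 (B_m \log B_m)/\log 2$ yields, for some constant $c_3$,

$$
E_0 L_m^2 \le e^{c_3 \mu_m^2 B_m \log B_m} + O(\mu_m^2 B_m) \to 1 \quad \text{as } m \to \infty.
$$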
Together with (2.19), this concludes the proof of the lower bound in Theorem 1.1.

3. The complete binary tree. In this section, we prove Theorem 1.3. For the upper bound, we show that the GLRT is asymptotically powerful if $\mu_m = \mu > \sqrt{2 \log 2}$ and that a closely related test is asymptotically powerful if $\mu_m = \mu = \sqrt{2 \log 2}$. For the lower bound, we study the likelihood ratio under the uniform prior on paths using a martingale approach.

We start by considering the GLRT, which is based on $M_m = \max\{X_p : p \in \mathcal{P}_m\}$, $X_p$ being defined in (1.4). We first show that under the null hypothesis, the GLRT obeys

$$
P_0(M_m > m \sqrt{2 \log 2}) \to 0, \quad m \to \infty.
$$

This is, in fact, a simple application of Boole's inequality and a standard bound on the tail of the normal distribution, valid for $t > 0$,

$$
P(N(0,1) > t) \le \frac{1}{t} \frac{e^{-t^2/2}}{\sqrt{2\pi}}.
$$

Indeed,

$$
P_0(M_m > m \sqrt{2 \log 2}) \le 2^{m-1} P_0(X_p > m \sqrt{2 \log 2}) = 2^{m-1} P(N(0,1) > \sqrt{2 m \log 2}) \le \frac{1}{4 \sqrt{\pi m \log 2}}. \tag{3.1}
$$
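As a quick numerical sanity check of (3.1), the sketch below compares the union bound $2^{m-1} P(N(0,1) > \sqrt{2 m \log 2})$, evaluated with the complementary error function, against the closed form $1/(4\sqrt{\pi m \log 2})$. The function name `null_exceedance_bounds` and the chosen values of $m$ are illustrative assumptions, not part of the paper.

```python
import math


def null_exceedance_bounds(m):
    """Return (union_bound, closed_form) for P_0(M_m > m * sqrt(2 log 2)).

    union_bound : 2^(m-1) * P(N(0,1) > sqrt(2 m log 2)), the middle term of (3.1)
    closed_form : 1 / (4 sqrt(pi m log 2)), the right-hand side of (3.1)
    """
    t = math.sqrt(2.0 * m * math.log(2.0))
    # Gaussian upper tail P(N(0,1) > t) written with erfc for numerical accuracy.
    gaussian_tail = 0.5 * math.erfc(t / math.sqrt(2.0))
    union_bound = 2.0 ** (m - 1) * gaussian_tail
    closed_form = 1.0 / (4.0 * math.sqrt(math.pi * m * math.log(2.0)))
    return union_bound, closed_form


# Moderate m only: for very large m the tail underflows in double precision.
for m in (10, 100, 500):
    union_bound, closed_form = null_exceedance_bounds(m)
    print(m, union_bound, closed_form)
```

Both columns decay like $m^{-1/2}$, which is why the bound is summable along $m_k = 2^k$ but not along all $m$.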



In fact, $M_m/m \to \sqrt{2 \log 2}$ a.s.; see [23], Section 3. Under any alternative $P_{1,p}$ with $\mu > \sqrt{2 \log 2}$, however, the GLRT obeys

$$
P_{1,p}(M_m > m \sqrt{2 \log 2}) \to 1, \quad m \to \infty. \tag{3.2}
$$

Indeed, if $p$ is the path along which the mean is elevated, $M_m \ge X_p$ and $X_p/m$ is normally distributed with mean $\mu$ and variance $1/m$. If $\mu = \sqrt{2 \log 2}$, the same argument gives

$$
\liminf_{m \to \infty} P_{1,p}(M_m > m \sqrt{2 \log 2}) \ge \frac{1}{2}
$$

for each path $p$ instead of (3.2). This is not quite enough to conclude that $H_0$ and $H_1$ can be separated with probability approaching 1. However, taking $m_k = 2^k$, from (3.1) and Borel-Cantelli, we have that

$$
P_0(M_{m_k} > m_k \sqrt{2 \log 2} \text{ infinitely often}) = 0,
$$

while standard estimates for random walks imply that

$$
P_{1,p}(M_{m_k} > m_k \sqrt{2 \log 2} \text{ infinitely often}) = 1
$$

for each $p$ and [because the increments $X_p(m_i) - X_p(m_{i-1})$ are exponentially mixing] even

$$
\liminf_{k \to \infty} \frac{1}{k} \sum_{i=1}^{k} \mathbf{1}_{\{M_{m_i} > m_i \sqrt{2 \log 2}\}} \ge \frac{1}{2}, \quad P_{1,p}\text{-a.s.}
$$

Therefore, the test which computes, along the sequence $m_k$, the number of times $M_{m_k} > m_k \sqrt{2 \log 2}$, declaring $H_0$ if this number is less than $k/4$ and $H_1$ otherwise, has asymptotic full power.

In conclusion, the GLRT (or its variant) has asymptotic full power if $\mu \ge \sqrt{2 \log 2}$.

We now turn to studying the likelihood ratio under the uniform prior $\pi$ on paths,

$$
L_m = 2^{-(m-1)} \sum_{\text{all paths } p} e^{\mu X_p - m \mu^2/2},
$$

and show that for $\mu < \sqrt{2 \log 2}$, its risk

$$
B_m(\pi) = P_0(L_m > 1) + P_\pi(L_m \le 1)
$$

is bounded away from 0. A lower bound such as (2.5) would not suffice here since we want to recover the same threshold $\sqrt{2 \log 2}$. Instead, we turn to martingale methods. Such methods have been used for years (see, e.g., [8, 12]). Here, we follow the presentation found in [9]. A simple calculation shows that

$$
B_m(\pi) = 1 - E_0 (1 - L_m)_+.
$$



Let $|v|$ denote the distance of a vertex $v$ from the root. By Proposition 1 in [9], we know that under $H_0$, $L_m$ is a nonnegative martingale with respect to the filtration $\mathcal{F}(X_v : |v| \le m-1)$, which converges pointwise to a finite, nonnegative random variable $L_\infty$. Hence, by dominated convergence,

$$
\lim_{m \to \infty} B_m(\pi) = 1 - E_0 (1 - L_\infty)_+.
$$

Applying Proposition 2 in [9], we have that for $\mu < \sqrt{2 \log 2}$, $L_m$ is uniformly integrable and, therefore, $E_0 L_\infty = 1$. Hence, $P_0(L_\infty = 0) < 1$ and, consequently,

$$
\lim_{m \to \infty} B_m(\pi) > 0.
$$

Finally, we briefly argue that if $\mu_m \to 0$, then every sequence of tests is asymptotically powerless. Here, it is enough to use the bound (2.5). It therefore suffices to prove that $\mathrm{Var}_0(L_m) \to 0$ as $m \to \infty$. Just as in (2.19), $\mathrm{Var}_0(L_m) = E e^{\mu_m^2 N_m} - 1$, where $N_m$ is the number of crossings between two random paths drawn from the prior $\pi$. Here, $P(N_m = k) = 2^{-k}$, $1 \le k \le m-1$, and $P(N_m = m) = 2^{-m+1}$. In short, the distribution of $N_m$ is that of a truncated geometric random variable with probability of success equal to 1/2. Set $\tau_m = e^{\mu_m^2}/2$, which is less than 1 for $m$ large. We compute

$$
\mathrm{Var}_0(L_m) = \frac{(2\tau_m - 1)(1 - \tau_m^{m})}{1 - \tau_m}.
$$

It is now clear that $\mathrm{Var}_0(L_m) \to 0$ when $\tau_m \to 1/2$ or, equivalently, when $\mu_m \to 0$. This completes the proof of the theorem.
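The variance computation can be checked numerically. The sketch below sums $e^{\mu^2 k} P(N_m = k)$ directly over the truncated geometric distribution stated above and compares the result with the closed form $(2\tau - 1)(1 - \tau^m)/(1 - \tau)$, $\tau = e^{\mu^2}/2$, obtained by summing the geometric series. The function names are illustrative assumptions.

```python
import math


def variance_by_summation(mu, m):
    """E exp(mu^2 N_m) - 1 with P(N_m = k) = 2^-k for 1 <= k <= m-1
    and P(N_m = m) = 2^-(m-1)."""
    total = sum(math.exp(mu ** 2 * k) * 2.0 ** (-k) for k in range(1, m))
    total += math.exp(mu ** 2 * m) * 2.0 ** (-(m - 1))
    return total - 1.0


def variance_closed_form(mu, m):
    """Closed form (2 tau - 1)(1 - tau^m) / (1 - tau), tau = exp(mu^2) / 2 < 1."""
    tau = math.exp(mu ** 2) / 2.0
    return (2.0 * tau - 1.0) * (1.0 - tau ** m) / (1.0 - tau)


for mu in (0.5, 0.1, 0.01):
    print(mu, variance_by_summation(mu, 50), variance_closed_form(mu, 50))
```

As $\mu$ decreases, $2\tau - 1 = e^{\mu^2} - 1$ vanishes while $1/(1 - \tau)$ stays bounded, which is the mechanism behind $\mathrm{Var}_0(L_m) \to 0$.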



4. Extension to exponential families. While the previous sections stud- 
ied the detection problem assuming a Gaussian distribution at the nodes of 
the graph, it is now time to emphasize that our results hold more generally. 
In fact, one can obtain similar conclusions for exponential models as well. 

Letting $F_0$ be a distribution on the real line, we define $F_\theta$ as the exponential family with associated density $\exp(\theta x - \log \varphi(\theta))$ with respect to $F_0$. Note that by definition, $\varphi(\theta) = E_{F_0}[\exp(\theta X)]$, where $E_{F_0}$ is the expectation under the distribution $F_0$. We always assume that $\varphi(\theta) < \infty$ for $\theta$ in a neighborhood of 0; further restrictions are mentioned when needed.

Under the null hypothesis, we assume that all the nodes are i.i.d. with distribution $F_0$, while under $H_{1,m}$, there is a path along which the nodes are i.i.d. with distribution $F_{\theta_m}$, $\theta_m > 0$, and distribution $F_0$ away from the path. The question is, of course, for what values of $\theta_m$ one can reliably detect this path. To connect this general set-up with the previously studied special case, set $\psi(\theta) = \log \varphi(\theta)$ and recall that

$$
\mu(\theta) := E_{F_\theta} X = \psi'(\theta) \quad \text{and} \quad \sigma^2(\theta) := \mathrm{Var}_{F_\theta} X = \psi''(\theta).
$$

With this notation, the mean shift is equal to

$$
\mu(\theta) - \mu(0) = \psi'(\theta) - \psi'(0) = \psi''(0)(\theta + o(\theta)).
$$

In other words, the value of a small mean shift is just about proportional to $\theta$. [In the Gaussian case, $\mu(\theta) = \theta$ and $\log \varphi(\theta) = \theta^2/2$.]

4.1. The regular lattice with an exponential family at the nodes. We first consider the minimax detection problem, and extend Theorem 1.1.

Theorem 4.1. Suppose that $\theta_m \sqrt{\log m} \to \infty$ as $m \to \infty$. There is then a sequence of tests which is asymptotically powerful. Conversely, suppose that $\theta_m \log m \sqrt{\log \log m} \to 0$ as $m \to \infty$. Then, every sequence of tests $(T_m)$ is asymptotically powerless.

In summary, one can reliably detect a path as long as the mean shift $\mu(\theta_m) - \mu(0) \gg (\log m)^{-1/2}$, while this is impossible if (ignoring the $\sqrt{\log \log m}$ factor) $\mu(\theta_m) - \mu(0) \ll (\log m)^{-1}$.

As an example, consider the case where we have exponentially distributed random variables; under the null, the node variables are exponentially distributed with mean 1, while under the alternative hypothesis, there is a path along which the node variables are exponentially distributed with mean $1 + \mu_m$. Let $F_0$ be the exponential distribution with mean 1. The density of an exponential random variable with mean $1 + \mu$ with respect to $F_0$ is given by

$$
(1 + \mu)^{-1} \exp(\mu x/(1 + \mu)) := \exp(\theta x - \log \varphi(\theta)),
$$

with

$$
\theta = \frac{\mu}{1 + \mu}, \qquad \varphi(\theta) = \frac{1}{1 - \theta}.
$$

For this exponential model, one can reliably detect a mean shift $\mu_m$ if it is significantly larger than $(\log m)^{-1/2}$, while this is impossible if it is much smaller than $(\log m)^{-1}$.

Proof of Theorem 4.1. The proof is similar to that of Theorem 1.1. For the upper bound, we consider the same statistic (2.12) as before, $T_m := \sum_{(i,j) \in V_m} w_{i,j} X_{i,j}$, with exactly the same choice of weights. First, observe that for any path $p$ from $H_1$, the mean difference obeys

$$
E_{1,p}(T_m) - E_0(T_m) = \mu(\theta_m) - \mu(0).
$$

As for the variances, we have

$$
\mathrm{Var}_0(T_m) = \sigma^2(0) \sum_{(i,j) \in V_m} w_{i,j}^2 = \lambda_m \sigma^2(0)
$$

and for any alternative in $H_1$,

$$
\mathrm{Var}_{1,p}(T_m) = \sigma^2(0) \sum_{(i,j) \in V_m} w_{i,j}^2 + [\sigma^2(\theta_m) - \sigma^2(0)] \sum_{0 \le i \le m-1} w_{i,p_i}^2
= \lambda_m \sigma^2(0) + [\sigma^2(\theta_m) - \sigma^2(0)]\, O(\lambda_m^2).
$$

Recall that $\lambda_m = (\log m)^{-1}(1 + o(1))$. Using Chebyshev's inequality, we see that the probabilities of Type I and Type II errors go to zero as soon as $[\mu(\theta_m) - \mu(0)] \lambda_m^{-1/2} \to \infty$ as $m \to \infty$. The first part of the theorem follows from $\mu(\theta_m) - \mu(0) = \theta_m \mathrm{Var}_{F_0}(X)(1 + o(1))$. That is, if the mean shift times $\sqrt{\log m}$ increases to infinity, then the probability of each type of error goes to zero.

For the lower bound, we consider the same prior distribution on the family of paths. For exponential models, the variance of the likelihood ratio $L_m$ is given by

$$
\mathrm{Var}_0(L_m) = E[\lambda(\theta_m)^{N_m}] - 1, \qquad \lambda(\theta) = \frac{\varphi(2\theta)}{\varphi(\theta)^2} \ge 1, \tag{4.1}
$$

where, again, $N_m$ is the number of crossings of two independent paths drawn from the prior, or

$$
\mathrm{Var}_0(L_m) = E e^{a^2(\theta_m) N_m} - 1, \qquad a(\theta) = \sqrt{\log \lambda(\theta)}.
$$

This is the same expression as before [cf. (2.19)] and our previous analysis shows the existence of a prior with the property

$$
\lim_{m \to \infty} a(\theta_m) \log m \sqrt{\log \log m} = 0 \implies \lim_{m \to \infty} \mathrm{Var}_0(L_m) = 0,
$$

which implies that the Bayes test is asymptotically powerless. It is now not difficult to see that for exponential models, $\lambda(\theta) = 1 + O(|\theta|^2)$ so that $a(\theta) = O(\theta)$ for $\theta$ close to zero. As a consequence,

$$
\lim_{m \to \infty} \theta_m \log m \sqrt{\log \log m} = 0 \implies \lim_{m \to \infty} a(\theta_m) \log m \sqrt{\log \log m} = 0,
$$

which establishes the second part of the theorem. □

Not surprisingly, the same extension also holds in the Bayesian set-up.

Theorem 4.2. Consider the uniform prior on paths. Suppose that $\theta_m m^{1/4} \to \infty$ as $m \to \infty$. The Bayes test is then asymptotically powerful. Conversely, if $\theta_m m^{1/4} \to 0$ as $m \to \infty$, then the Bayes risk tends to 1 and every sequence of tests $(T_m)$ is asymptotically powerless.

The proof follows that of Theorems 1.2 and 4.1. We omit the details.

4.2. The tree with an exponential family at the nodes. Following [9], define the function $f$ as

$$
f(\theta) = \frac{1}{\theta} \log(2 \varphi(\theta)). \tag{4.2}
$$

By Lemma 4 in [9], $f$ either attains its unique minimum or $f$ is strictly decreasing on $(0, \infty)$. In any case, we denote by $\theta^* \in (0, \infty]$ the value where $f$ is minimum.

Theorem 4.3. Assume that $\varphi(\theta) < \infty$ in a neighborhood of $\theta^*$. If $\theta_m = \theta > \theta^*$, then the GLRT is asymptotically powerful. If $\theta_m = \theta < \theta^*$, then there does not exist any asymptotically powerful sequence of tests. If $\theta_m \to 0$, then all sequences of tests are asymptotically powerless. Finally, if $\theta_m = \theta^*$, then there exists a sequence of asymptotically powerful tests.

For exponential random variables, $\varphi(\theta) = 1/(1 - \theta)$ and we numerically compute $\theta^* \approx .63$. In terms of mean shift (see above), we have $\mu(\theta^*) - \mu(0) = 1/(1 - \theta^*) - 1 \approx 1.70$. The mean difference along the unknown path must exceed approximately 1.70 to be reliably detectable.
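The numerical value of $\theta^*$ for the exponential model can be reproduced with a few lines of code. The following is a minimal sketch, assuming $f(\theta) = \theta^{-1} \log(2\varphi(\theta))$ with $\varphi(\theta) = 1/(1 - \theta)$ on $(0, 1)$ and using a golden-section search; the search method, the helper names and the tolerance are choices made here for illustration and are not taken from the paper.

```python
import math


def f_exponential(theta):
    """f(theta) = log(2 * phi(theta)) / theta with phi(theta) = 1 / (1 - theta)."""
    return (math.log(2.0) - math.log(1.0 - theta)) / theta


def argmin_golden(func, lo, hi, tol=1e-10):
    """Golden-section search for the minimizer of a unimodal function on [lo, hi]."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - ratio * (b - a)
    d = a + ratio * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            # The minimizer lies in [a, d]; reuse c as the new right probe.
            b, d = d, c
            c = b - ratio * (b - a)
        else:
            # The minimizer lies in [c, b]; reuse d as the new left probe.
            a, c = c, d
            d = a + ratio * (b - a)
    return (a + b) / 2.0


theta_star = argmin_golden(f_exponential, 1e-6, 1.0 - 1e-6)
mean_shift = 1.0 / (1.0 - theta_star) - 1.0
print(theta_star, mean_shift)
```

The search returns a value close to the $\theta^* \approx .63$ quoted above; because $1/(1 - \theta)$ is steep near that point, the implied mean shift is sensitive to the third decimal of $\theta^*$.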

For Bernoulli random variables, $F_\theta = \mathrm{Bernoulli}(e^\theta/(1 + e^\theta))$, the function $f$ is decreasing on $(0, \infty)$ and, therefore, $\theta^* = \infty$. Theorem 4.3 then implies that no asymptotically powerful sequences of tests exist for testing fair coin tossing at the nodes versus biased coin tossing with parameter $q \in (1/2, 1)$ along a path. Note that the situation drastically changes when $q = 1$: in this case, under $H_0$, the nodes with value 1 that are connected to the root node through a path of nodes of value 1 form a critical branching process (with an expected number of descendants at each node equal to 1) which, therefore, eventually dies out. Under $H_1$, however, there is always a path of length $m$ starting from the origin and with all 1's. Hence, the test that declares $H_1$ if one finds such a path and $H_0$ otherwise is asymptotically powerful.

Proof of Theorem 4.3. The proof is very similar to that of Theorem 1.3. We start with the upper bound, assuming $\theta^* < \infty$. Define $\xi(t) = \inf_{\theta > 0} \varphi(\theta) e^{-\theta t}$. Note that

$$
\xi(t) = 1/2 \iff \inf_{\theta > 0} (\log(2\varphi(\theta)) - \theta t) = 0 \iff t = \inf_{\theta > 0} f(\theta) = f(\theta^*). \tag{4.3}
$$

Because $\varphi(\theta) < \infty$ in a neighborhood of $\theta^*$, we can replace the estimate (3.1) by the Bahadur-Rao bound [4], which yields

$$
P_0(M_m > m \xi^{-1}(1/2)) \le 2^{m-1} P_0(X_p > m \xi^{-1}(1/2)) \le \frac{C}{\sqrt{m}}
$$

for some constant $C$. (In fact, under our assumptions, $M_m/m \to \xi^{-1}(1/2)$ a.s., by the argument in [23], Section 3.) This estimate and (4.3) imply that

$$
P_0(M_m > m f(\theta^*)) \le \frac{C}{\sqrt{m}}. \tag{4.4}
$$

We now study the behavior of $M_m/m$ under $H_1$. Let $p$ be the path along which the nodes are sampled from the distribution $F_\theta$. The strong law of large numbers then shows that $\lim_{m \to \infty} X_p/m = E_{F_\theta} X$ a.s. and, therefore,

$$
\liminf_{m \to \infty} \frac{M_m}{m} \ge \frac{d}{d\theta} (\log \varphi(\theta)) \quad \text{a.s.}
$$

The derivative obeys $\frac{d}{d\theta}(\log \varphi(\theta)) = \varphi'(\theta)/\varphi(\theta) > f(\theta^*)$ if and only if $\theta > \theta^*$. This equivalence follows from the identity

$$
\frac{d}{d\theta} \log(\varphi(\theta)) - f(\theta^*) = \theta f'(\theta) + f(\theta) - f(\theta^*).
$$

Since $f$ is decreasing on $(0, \theta^*)$ and strictly increasing on $(\theta^*, \infty)$, the right-hand side has the sign of $\theta - \theta^*$. This analysis shows that the GLRT has asymptotic full power if $\theta > \theta^*$, and the argument for handling $\theta = \theta^*$ is the same as in the Gaussian case, using the full power of (4.4).

The study of the likelihood ratio under the uniform prior over paths is identical to that in the Gaussian case, with the exception that when proving the uniform integrability of the martingale $L_m$, we use Biggins's theorem (in the form given in [21], noting the condition $\varphi(\theta) < \infty$ in a neighborhood of $\theta^*$) instead of using Proposition 2 from [9]. [The latter proposition requires that $\varphi(\theta)$ be finite for all $\theta > 0$, or at least for $\theta = 2\theta^*$.] □

5. Extension to other graphs. This section emphasizes that results are 
available for other graphs and, in particular, for the analog of the regular 
lattice in higher dimensions. 

• Regular lattice in dimension $d' = d + 1$. This is the graph with vertex set

$$
V = \{(i, j_1, \ldots, j_d) : 0 \le i,\ -i \le j_k \le i \text{ and } j_k \text{ has the parity of } i\}
$$

and oriented edges $(i, j_1, \ldots, j_d) \to (i + 1, j_1 + s_1, \ldots, j_d + s_d)$, where $s_k = \pm 1$.

Consider a distribution from the exponential family at the nodes and the uniform prior on paths. In this case, the likelihood ratio has been studied in dimension $d + 1$ (under the name of the partition function) in the context of directed random polymers. Martingale methods work well in this context and the behavior of the likelihood ratio for $d \ge 3$ is similar to the behavior of the likelihood ratio for the tree that we studied in Section 3; see [11], Proposition 3.2.1. In particular, for $d \ge 3$, there are no asymptotically powerful sequences of tests if $\theta_m = \theta$ obeys $\lambda(\theta) \rho_d < 1$, where $\lambda(\theta)$ is defined as in (4.1) and $\rho_d$ is the return probability of a symmetric random walk in dimension $d$. (The results for $d = 2$ only imply that the Bayes risk tends to zero if $\theta_m = \theta > 0$.) In contrast, the minimax risk does not go to zero here and this follows from the construction of a prior with low predictability profile. We give a general statement in Theorem 5.3.

To establish a general result, we work with a connected graph (directed or undirected), with one vertex marked that we call the origin, and, as before, we let $\mathcal{P}$ be the set of self-avoiding paths starting at the origin and $\mathcal{P}_m \subset \mathcal{P}$ be the subset of paths of length $m$. Under the null hypothesis, all the nodes are i.i.d. $F_0$, while under the alternative, there is a path in $\mathcal{P}_m$ along which the nodes are i.i.d. $F_1$. We assume throughout that $F_1$ is absolutely continuous with respect to $F_0$; otherwise, the detection problem becomes trivial.

Definition 5.1. A distribution $\pi$ on $\mathcal{P}$ is said to have an exponential intersection tail with parameter $\eta \in (0, 1)$ if there exists $C > 0$ such that if $N$ is the number of crossings of two independent samples from $\pi$, then

$$
P(N \ge k) \le C \cdot \eta^k \quad \forall k \ge 1.
$$

The regular lattice with $d \ge 2$ (i.e., $d' \ge 3$) admits a measure on paths with an exponential intersection tail [7], Theorem 1.3. Note that a summable predictability profile implies an exponential intersection tail.

Definition 5.2. Let $L = dF_1/dF_0$ be the likelihood ratio for testing $F_1$ versus $F_0$ at a single node. The Pearson $\chi^2$-distance between $F_0$ and $F_1$ is defined as $\chi^2(F_0, F_1) = \mathrm{Var}_0(L)$.

With these definitions, we have the following general statement. 

Theorem 5.3. Suppose that there is a distribution $\pi$ on $\mathcal{P}$ having an exponential intersection tail with parameter $\eta$. Then, if $\chi^2(F_0, F_1) < \eta^{-1} - 1$, there are no asymptotically powerful sequences of tests.

The proof does not require any argument that we have not already presented, and is omitted. For exponential variables, $\chi^2(F_0, F_\theta) = \lambda(\theta) - 1$, where $\lambda(\theta)$ is defined as in (4.1) and, therefore, no asymptotically powerful sequences of tests exist if $\lambda(\theta)\eta < 1$.
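Since the proof is omitted, the following is only a sketch of how the earlier arguments might combine, not the authors' argument verbatim. It assumes the second-moment identity $E_0 L_m^2 = E \lambda^{N_m} \le E \lambda^{N}$ with $\lambda = 1 + \chi^2(F_0, F_1) \ge 1$ (the analog of (4.1), with $N_m \le N$ the crossings within the first $m$ steps), and it introduces two symbols for illustration: $A_m$ for an arbitrary rejection region and $P_\pi$ for the alternative averaged over the prior $\pi$.

```latex
\begin{gather*}
E\, \lambda^{N}
  = 1 + (\lambda - 1) \sum_{k \ge 1} \lambda^{k-1} P(N \ge k)
  \le 1 + (\lambda - 1)\, C \sum_{k \ge 1} \lambda^{k-1} \eta^{k}
  = 1 + \frac{(\lambda - 1)\, C\, \eta}{1 - \lambda \eta} < \infty
  \quad \text{when } \lambda \eta < 1, \\
P_\pi(A_m) = E_0\bigl[ L_m \mathbf{1}_{A_m} \bigr]
  \le \sqrt{E_0 L_m^2}\, \sqrt{P_0(A_m)} .
\end{gather*}
```

The first line is a summation by parts followed by the exponential intersection tail; the second is the Cauchy-Schwarz inequality. If the Type I error $P_0(A_m)$ tends to zero, the bounded second moment forces $P_\pi(A_m) \to 0$ as well, so the two error probabilities cannot both vanish.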

Theorem 5.3 provides a lower bound on the minimax threshold for reliable detection. For an upper bound, suppose, for example, that the variables are exponentially distributed and assume that $|\mathcal{P}_m| = O(\delta^m)$ for some positive constant $\delta$; for instance, $\delta = 2^d$ works for the regular lattice in dimension $d + 1$. Application of Boole's inequality and the law of large numbers shows that under those assumptions, the GLRT is asymptotically powerful if $\xi(t)\delta > 1$, where, again, $\xi(t) = \inf_{\theta > 0} \varphi(\theta) e^{-t\theta}$.

6. Numerical experiments. We now explore the empirical performance of some of the detection methods we proposed for the regular lattice. The variables at the nodes are independent Gaussians. To measure the performance, we fix the probability of Type I error at 5% and estimate the power or detection rate, that is, the probability of deciding in favor of the alternative $H_1$ when $H_1$ is true. This power function was estimated at values of the mean shift $\mu$ (the mean of the node variables along the path) at which this function is varying.

6.1. Bayesian detection under the uniform prior. We first consider de- 
tection under the uniform prior on paths. We compare the performance of 
the Bayes test, the GLRT and the test based on the strip statistic which 
was used in the proof of the upper bound in Theorem 1.2. The Bayes test 
is optimal in this setting and we recall that the strip statistic was shown to 
achieve the optimal detection rate. This paper did not theoretically analyze 
the performance of the GLRT in this situation, however, and we would like 
to do so empirically. 

6.1.1. Computing the Bayes statistic. As emphasized earlier, there exists a rapid algorithm for calculating the Bayes statistic $L_m(X)$ [(2.1)]. Consider any node $v = (i, j)$ ($0 \le i \le m-1$ and $j$ has the parity of $i$) and let $\mathcal{P}^{\mathrm{End}}(v)$ be the set of paths starting at the root $(0, 0)$ and ending at the node $v$. Set

$$
Y(v) := 2^{-i} \sum_{p \in \mathcal{P}^{\mathrm{End}}(v)} e^{\mu X_p - (i+1)\mu^2/2}.
$$

With this notation, $L_m(X)$ is the sum of $Y$ over all the terminal nodes $v$ for which $i = m - 1$. Now, observe the recurrence

$$
Y(v) = e^{\mu X_v - \mu^2/2}\, \frac{Y(v^+) + Y(v^-)}{2}, \tag{6.1}
$$

where $(v^+, v^-)$ are the two predecessors of $v$ in the graph, that is, the two nodes from which one can reach $v$ in one step. [By convention, set $Y(v^\pm) = 0$ if $v^\pm$ is outside the grid.] This recurrence shows that one can compute the Bayes statistic in $O(m^2)$ flops.
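The recurrence can be sketched in a few lines. The code below is an illustration under stated assumptions, not the authors' implementation: the lattice is stored level by level as `x[i][r]` with `r = (j + i) / 2`, so the predecessors of position `r` on level `i` are positions `r - 1` and `r` on level `i - 1`; the function names are hypothetical. A brute-force average over all $2^{m-1}$ paths is included only to check the recurrence on a small lattice.

```python
import itertools
import math
import random


def bayes_statistic(x, mu):
    """L_m(X) via Y(v) = exp(mu * X_v - mu^2 / 2) * (Y(v+) + Y(v-)) / 2.

    x[i][r] is the observation at level i, position r (0 <= r <= i).
    """
    y = [math.exp(mu * x[0][0] - mu ** 2 / 2.0)]
    for i in range(1, len(x)):
        prev, y = y, []
        for r in range(i + 1):
            left = prev[r - 1] if r >= 1 else 0.0      # predecessor outside the grid
            right = prev[r] if r <= i - 1 else 0.0     # contributes zero
            y.append(math.exp(mu * x[i][r] - mu ** 2 / 2.0) * (left + right) / 2.0)
    return sum(y)


def bayes_statistic_bruteforce(x, mu):
    """Average of exp(mu * X_p - m mu^2 / 2) over all 2^(m-1) paths (small m only)."""
    m = len(x)
    total = 0.0
    for steps in itertools.product((0, 1), repeat=m - 1):
        r, path_sum = 0, x[0][0]
        for i, s in enumerate(steps, start=1):
            r += s
            path_sum += x[i][r]
        total += math.exp(mu * path_sum - m * mu ** 2 / 2.0)
    return total / 2 ** (m - 1)


rng = random.Random(0)
x = [[rng.gauss(0.0, 1.0) for _ in range(i + 1)] for i in range(10)]
print(bayes_statistic(x, 0.5), bayes_statistic_bruteforce(x, 0.5))
```

The dynamic program touches each of the $m(m+1)/2$ nodes once, which is the $O(m^2)$ cost mentioned above, whereas the brute-force sum grows like $2^m$.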

For each value of $\mu$ and $m$, then, we simulated the Bayes statistic under $H_0$ and $H_1$ using 2,000 realizations for each. Here and below, each realization uses a new path realization drawn from the uniform distribution.

6.1.2. Simulating the strip statistic. For a positive integer $B$, the strip statistic $T_{m,B}$ is the sum of the random variables falling in the centered strip of length $m$ and width $2B + 1$,

$$
T_{m,B} = \sum_{0 \le i \le m-1}\ \sum_{j : |j| \le \min(i, B)} X_{i,j}.
$$

Under $H_0$, $T_{m,B} \sim N(0, n_{m,B})$, where $n_{m,B}$ is the number of vertices in the strip, while under $H_1$ (fixed path), $T_{m,B} \sim N(\mu \cdot R_{m,B}, n_{m,B})$, where $R_{m,B}$ is the number of vertices inside the strip that the random path visits. Therefore, one can simulate $T_{m,B}$ by taking one realization of $R_{m,B}$, multiplying it by $\mu$ and adding an independent mean-zero Gaussian variable with variance $n_{m,B}$.
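This shortcut can be sketched as follows. The code is a minimal illustration, assuming the strip consists of the lattice vertices $(i, j)$ with $|j| \le \min(i, B)$ and $j$ of the parity of $i$, and that the path is a simple symmetric random walk started at the origin (the uniform prior); the function names and the example parameters are hypothetical.

```python
import random


def strip_vertex_count(m, b):
    """n_{m,B}: number of vertices (i, j), 0 <= i <= m-1, with |j| <= min(i, B)
    and j of the same parity as i."""
    count = 0
    for i in range(m):
        w = min(i, b)
        # Integers j in [-w, w] with the parity of i: w + 1 if w and i share
        # parity, otherwise w.
        count += w + 1 if (w - i) % 2 == 0 else w
    return count


def simulate_strip_statistic(m, b, mu, n_vertices, rng):
    """One draw of T_{m,B}: mu * R_{m,B} plus N(0, n_{m,B}) noise (mu = 0 gives H0)."""
    position, visits = 0, 1              # the origin always lies in the strip
    for _ in range(1, m):
        position += rng.choice((-1, 1))  # one step of the uniformly drawn path
        if abs(position) <= b:
            visits += 1
    return mu * visits + rng.gauss(0.0, n_vertices ** 0.5)


rng = random.Random(0)
m = 257
b = round(2 * m ** 0.5)
n = strip_vertex_count(m, b)
draws_h0 = [simulate_strip_statistic(m, b, 0.0, n, rng) for _ in range(200)]
draws_h1 = [simulate_strip_statistic(m, b, 0.8, n, rng) for _ in range(200)]
print(sum(draws_h0) / len(draws_h0), sum(draws_h1) / len(draws_h1))
```

Each draw costs $O(m)$ operations instead of the $O(m^2)$ needed to generate the whole lattice, which is what makes large graph sizes accessible for this statistic.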

It remains to choose the width of the strip. We ran simulations with $B = \nu \sqrt{m}$ for $\nu = 0.75, 1, 2, 3$. Among these values, $B = 2\sqrt{m}$ gave the best performance (at least for the graph sizes we considered). Finally, for a fixed $\mu$ and $m$, we used 5,000 realizations of the test statistic to estimate the detection rate.

6.1.3. Simulating the GLRT. The GLRT rejects for large values of $M_m = \max\{X_p : p \in \mathcal{P}_m\}$. This statistic can be calculated rapidly using dynamic programming; for example, Dijkstra's algorithm [1] has here a computational complexity proportional to the number of nodes. For each graph size, the threshold corresponding to a Type I error probability approximately equal to .05 and the detection rate for a fixed $\mu$ were based on 10,000 and 1,000 realizations, respectively.
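One way to carry out such a dynamic program is a forward max-plus pass over the levels of the lattice. The sketch below is an assumption-laden illustration rather than the algorithm cited above: it is a plain level-by-level recursion, not Dijkstra's algorithm, and it uses a hypothetical storage layout `x[i][r]` with `r = (j + i) / 2`, so that the predecessors of position `r` on level `i` are positions `r - 1` and `r` on level `i - 1`.

```python
def glrt_statistic(x):
    """M_m = max over paths from the root of the sum of x along the path.

    x[i][r] is the observation at level i, position r (0 <= r <= i).
    best[r] holds the largest path sum among paths ending at position r.
    """
    best = [x[0][0]]
    for i in range(1, len(x)):
        prev, best = best, []
        for r in range(i + 1):
            candidates = []
            if r >= 1:
                candidates.append(prev[r - 1])
            if r <= i - 1:
                candidates.append(prev[r])
            best.append(x[i][r] + max(candidates))
    return max(best)


# Small lattice with m = 3 levels; the best path picks up 1, 2 and 5.
example = [[1.0], [2.0, -1.0], [0.0, 5.0, 1.0]]
print(glrt_statistic(example))
```

Each node is visited once and compared against at most two predecessors, so the cost is proportional to the number of nodes, in line with the complexity stated above.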

6.1.4. Comparing the tests. To compare the three tests, one can estimate the value of the mean shift which gives a detection rate of about 95% from graphs plotting the detection rates versus $\mu$ (see Figure 6). Call this quantity $\mu_{0.95}$. Table 1 shows $\mu_{0.95}$ for the Bayes test, the strip statistic test and the GLRT for different graph sizes. As expected, the Bayes test outperforms the other two, but one needs to recall that those tests do not require information about the parameter $\mu$, while the Bayes test does. Figure 3 shows a log-log plot of $\mu_{0.95}$ as a function of $m$, together with least-squares line fits. The slope of the line is $-0.255$ for the Bayes test and $-0.246$ for the strip test. Both of these values are quite close to the $-1/4$ exponent one finds in Theorem 1.2. For the GLRT, the slope is about $-0.16$. This suggests that the strip statistic test might eventually outperform the GLRT for sufficiently large graphs. The fitted lines meet at approximately $m = 2^{20} \approx 10^6$, but it would be computationally extremely intensive to run simulations for graphs of this size. The point here is that these simulations suggest that the GLRT is only able to detect at $\mu \approx m^{-1/6}$ and, therefore, does not achieve the optimal detection rate under the uniform prior on paths.

Table 1
Value of the mean shift giving a detection rate of about 95% when using the Bayes test, the strip statistic test with width $B = 2\sqrt{m}$ and the GLRT (uniform prior on paths); one can compute $\mu_{0.95}$ for the strip statistic for large values of $m$ since it is given analytically

m                        1025    2049    4097    8193    16385
$\mu_{0.95}$ (Bayes)     0.37    0.31    0.26    -       -
$\mu_{0.95}$ (strip)     0.84    0.69    0.59    0.51    0.42
$\mu_{0.95}$ (GLRT)      0.46    0.40    0.36    0.33    -
6.2. Minimax detection. We focus here on the increasing path $p$, where $p_i = i$, $0 \le i \le m-1$, as we believe this path to be the most challenging for the GLRT. In this section, we compare the performance of the GLRT with the weighted average statistic test (WAS) defined in (2.12).

Recall that the WAS is distributed as $N(0, \lambda_m)$ under $H_0$ and as $N(\mu, \lambda_m)$ under $H_1$, regardless of the unknown path [$\lambda_m \sim (\log m)^{-1}$]. Thus, to achieve a power equal to 0.95 at the 5% significance level, we need $\mu \ge 2 z_{0.95} \sqrt{\lambda_m}$, where $z_{0.95}$ is the 95% standard normal quantile. Some power curves for the WAS are graphed in Figure 5. We use simulations to graph similar curves for the GLRT; see Figure 6. Each point is based on 1,000 realizations of the statistic.

Fig. 3. Comparison of the Bayes test, the strip statistic test and the GLRT under the uniform prior. The plot shows the value $\mu_{0.95}$ of the mean shift for which a given test achieves a 95% detection rate when the rate of false alarm is set at 5% as a function of the graph size $m$ (log-log scale).

Fig. 4. Comparison of the GLRT and the WAS when the anomalous path is the increasing path. The plot shows the value $\mu_{0.95}$ of the mean shift for which a given test achieves a 95% detection rate when the rate of false alarm is set at 5% as a function of the graph size $m$.

Table 2
Value of the mean shift giving a detection rate of about 95% when using the WAS test and the GLRT for detecting the increasing path; one can compute $\mu_{0.95}$ for the WAS for large values of $m$ since it is given analytically

m                       1025    2049    4097    8193    16385   32769
$\mu_{0.95}$ (WAS)      1.20    1.15    1.10    1.06    1.03    0.99
$\mu_{0.95}$ (GLRT)     0.90    0.89    0.885   0.88    -       -
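The analytic WAS threshold $\mu \ge 2 z_{0.95} \sqrt{\lambda_m}$ is easy to evaluate. The sketch below is an illustration under one explicit assumption: since the definition (2.12) is not restated here, it takes $\lambda_m = 1/\sum_{i=1}^{m} 1/i$, which is consistent with $\lambda_m \sim (\log m)^{-1}$ but is an assumption of this example. The function name `was_mu95` is hypothetical.

```python
from statistics import NormalDist


def was_mu95(m, alpha=0.05, power=0.95):
    """Mean shift at which a N(mu, lambda_m) versus N(0, lambda_m) test of
    level alpha reaches the given power.

    ASSUMPTION: lambda_m = 1 / (1 + 1/2 + ... + 1/m); the exact definition of
    lambda_m belongs to (2.12) and is not restated in this section.
    """
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    lambda_m = 1.0 / harmonic
    quantile = NormalDist().inv_cdf
    # With alpha = 0.05 and power = 0.95 this reduces to 2 * z_0.95 * sqrt(lambda_m).
    return (quantile(1.0 - alpha) + quantile(power)) * lambda_m ** 0.5


for m in (1025, 2049, 4097, 8193, 16385, 32769):
    print(m, round(was_mu95(m), 2))
```

Under this assumed form of $\lambda_m$, the computed values agree with the WAS row of Table 2 up to rounding, and they illustrate how slowly a threshold of order $(\log m)^{-1/2}$ decreases with the graph size.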

While the power curves for the WAS tend to translate to the left, this does not seem to be the case for the GLRT. This might indicate that the detection threshold for the GLRT does not tend to zero as $m$ increases, just as in the case of the binary tree.

Fig. 5. Detection rate curves for the WAS statistic with m = 1025, 2049, 4097, 8193, 16385, 32769. As m increases, the curve moves to the left. The Type I error is set to 5%.

7. Discussion. Our paper leaves a number of open questions and invites several refinements. We briefly discuss some of these.

7.1. Sharpening the minimax detectability threshold in the two-dimensional regular lattice. There is a gap between the upper and lower bounds in Theorem 1.1: the detection threshold for our test statistic (2.12) is of order $\mu_m \sim (\log m)^{-1/2}$, but the priors we constructed showed nondetectability only when $\mu_m \sim (\log m)^{-1}$ (ignoring loglog factors). We do not see how to improve our prior to yield significantly better bounds and it seems that in any case, explicit priors of this family (as constructed in [17], for example) will not yield a lower bound obeying $\mu_m \gg (\log m)^{-3/4}$. It would be very interesting to understand this better and decide what is the actual rate of the detectability threshold.

With this in mind, we would like to emphasize that the test (2.12) used to prove the upper bound in Theorem 1.1 does not use the "continuity" of the path, only that it is known to be in the grid. That is, the test detects any sequence of the form $\{(i, p_i) : 0 \le i \le m-1\}$ as long as $(i, p_i)$ is a vertex in the graph, provided, of course, that $\mu_m$ is of order $(\log m)^{-1/2}$. In fact, $(\log m)^{-1/2}$ turns out to be the minimax detection threshold when the set of vertices with positive mean is any sequence $(i, p_i)$ remaining in the grid. Indeed, the least favorable prior chooses the $(p_i)$ independently and uniformly at random in their respective range so that the number of crossings of two independent paths obeys

$$
N_m = \sum_{1 \le i \le m} I_i,
$$

where the $I_i$'s are independent with $P(I_i = 1) = 1/i$ and $P(I_i = 0) = 1 - 1/i$. The same argument as before shows that

$$
E_0 (L_m - 1)^2 = E e^{\mu_m^2 N_m} - 1 = \prod_{1 \le i \le m} \left( 1 + \frac{e^{\mu_m^2} - 1}{i} \right) - 1,
$$

which is easily shown to converge to zero when $\mu_m (\log m)^{1/2} \to 0$.

7.2. Studying the GLRT on the two-dimensional regular lattice. The GLRT may not be anywhere near optimal in the minimax sense. An indication of that can be deduced from work of Baik and Rains in [5], Section 4.4, and [6]. In the language of the current paper, they deal with the following problem: consider directed paths in the grid $\{(i, j) \in \mathbb{Z}^2 : j \le i \le m\}$. That is, starting from the origin $(0, 0)$, a path is a sequence of increments by 1 unit in the right or upward direction (this corresponds to a rotation of the regular graph considered in Theorem 1.1, with its lower half erased). Under $H_0$, all vertices are i.i.d. exponential random variables with parameter 1. Under $H_1$, the variables along the "diagonal path" [the path $(0,0)$, $(1,1)$, $(2,2)$ and so on] are i.i.d. exponential with mean $1 + \mu$ (of course, in this situation, $H_1$ is asymptotically distinguishable from $H_0$ if $\mu > 0$, but this is of no concern in what follows). They consider the GLRT statistic $M_m$, which consists of the maximum partial sums among all possible directed paths connecting $(0, 0)$ to $(m, m)$, and show that the limit distribution of (a properly rescaled version of) $M_m$ does not depend on $\mu$ as long as $\mu < 1$ (this follows from the geometric case treated in [5], Section 4.4). This hints that in that particular set-up, the GLRT is far from optimal since Section 2 shows that the minimax risk with respect to all possible directed paths goes to zero for any $\mu > 0$. (Note that, strictly speaking, since the mode of convergence in [5] is weak convergence and not total variation, the results there hint, but do not imply, that the GLRT is not optimal.) Recently, Beffara and Sidoravicius (in a yet unpublished work) have analyzed the GLRT for the model considered in Theorem 1.1 (with exponential random variables), and their results seem to imply that the threshold for the GLRT is of order $o(1)$, in contrast with the case [5] treated by Baik and Rains.

Also of interest would be to study the power of the GLRT with a uniform 
prior on paths, where we suspect that the GLRT does not achieve the optimal 
threshold. 

7.3. Unknown starting location. Throughout this paper, we assumed that under $H_1$, the unknown path starts at a known node (the origin). The same question can also be posed when the starting location is not known. For concreteness, consider the regular lattice as in Section 2 and allow the unknown path of length $m/2$ to start at any vertex in the collection $\{(i, j)\}_{i=0}^{m/2}$. Does there exist an asymptotically powerful test (in the minimax sense) for some sequence $\mu_m \to 0$? Similarly, we could also imagine having a square lattice $V_m = \{(i, j)\}$ with $0 \le i \le m-1$, $0 \le j < 2m$ ($j$ has the parity of $i$ as before) and with edges $(i, j) \to (i+1, j+s)$, where $s = \pm 1$ and $j + s$ is understood modulo $2m$. If we know the starting location $(0, j)$ of the unknown path of length $m$, then this is the model problem discussed in Section 2. But studying this problem when we do not know the starting vertex is also of interest.

7.4. Further refinements. In this paper, we assumed that the node vari- 
ables are independent and identically distributed and, clearly, one could 
address similar testing problems in far more general set-ups. Interesting ex- 
tensions include situations in which the variables are correlated or in which 
the means along the unknown path are not all equal. Following up on the 
nonparametric signal detection problem, one could also imagine problems 
where the vector of means is not exactly sparse in the sense that it is zero 
away from the unknown path, but only rapidly decaying away from this 
path. 

While this paper focuses on asymptotic properties of the detection problem, it is also of interest to develop test statistics with good finite sample size properties and we hope to report on our progress in a future publication.

7.5. Other work. While this paper was being written, N. Berger and Y. Peres described to us some of their own results, obtained independently, which address related problems and may answer some of the questions raised above.

APPENDIX: PROOF OF LEMMA 2.3

To construct a stochastic process obeying (2.15), we follow [17] and let $S_n$ be the sum $S_n = \sum_{i=1}^{n} I_i$ with $P(I_i = 1) = p_i$ and $P(I_i = -1) = 1 - p_i$. Here, the $p_i$'s are stochastic (random environment) and defined by

$$
p_i = 1/2 + p_i^{(1)} + p_i^{(2)} + \cdots,
$$

where $(p_i^{(1)}), (p_i^{(2)}), \ldots$ are independent processes.

1. For each $i$ and $j$, the distribution of $p_i^{(j)}$ is uniform on $[-a_j, a_j]$.

2. The value $p_i^{(j)}$ is constant in $i$ for $i = 1, \ldots, 2^j$. At time $2^j + 1$, it switches to a new independent value, uniform on $[-a_j, a_j]$, which is kept until time $2 \times 2^j$, and so on.

Note that we need

$$
\sum_{j \ge 0} a_j < 1/2 \tag{A.1}
$$

for this to make sense so that the $p_i \in (0, 1)$. Finally, the $I_i$'s are independent, conditioned on the random environment $(p_i)$.

With this in place, Häggström and Mossel in [17], Proposition 3.1 showed that there exists a nearest-neighbor process $(S_n)$ obeying

$$
\mathrm{PRE}_S(k) \le \frac{C}{k\, a_{\lfloor \log_2(k/2) \rfloor}} \quad \text{for all } k = 1, 2, \ldots, \tag{A.2}
$$

where $C = 4[C_1 + 1]$, with $C_1 = 2^{m_k} a_{m_k} \cdot P(Y < EY/2)$, $m_k = \lfloor \log_2(k/2) \rfloor$ and $Y$ is a binomial random variable with $2^{m_k}$ trials and a probability of success equal to $a_{m_k}$. Since, for any binomial variable $Y_{n,p} \sim \mathrm{Bin}(n, p)$,

$$
np\, P(Y_{n,p} < np/2) \le 4np\, \frac{\mathrm{Var}(Y_{n,p})}{(np)^2} \le 4,
$$

$C_1 \le 4$ and thus the constant $C \le 20$.

As discussed earlier, this remark is of importance to us since we have used a sequence $(a_j)$ that depends explicitly on $m$.